[RFC] Add ChannelMonitor::get_justice_txs for simplified watchtower integration#4453
[RFC] Add ChannelMonitor::get_justice_txs for simplified watchtower integration#4453FreeOnlineUser wants to merge 1 commit intolightningdevkit:mainfrom
Conversation
|
👋 I see @valentinewallace was un-assigned. |
6993e59 to
d6c7903
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4453 +/- ##
==========================================
- Coverage 87.10% 87.09% -0.01%
==========================================
Files 163 163
Lines 108765 108977 +212
Branches 108765 108977 +212
==========================================
+ Hits 94743 94917 +174
- Misses 11535 11570 +35
- Partials 2487 2490 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| /// Contains the current and previous counterparty commitment(s). With splicing, | ||
| /// there may be multiple entries per commitment number (one per funding scope). | ||
| /// Pruned to remove entries more than one revocation old. | ||
| latest_counterparty_commitment_txs: Vec<CommitmentTransaction>, |
There was a problem hiding this comment.
This should be in the funding scope to support splicing. Also should probably just be cur_... and prev_... to match existing API pattern rather than a vec that is max-size 2.
There was a problem hiding this comment.
Moved to FundingScope as cur/prev fields. TLV 13/15 on FundingScope, 39/41 on the outer scope for backwards compat.
lightning/src/util/test_utils.rs
Outdated
| Err(_) => break, | ||
| } | ||
| } | ||
| // Use the simplified get_justice_txs API |
There was a problem hiding this comment.
Claude loves to leave comments about how it changed the code that are totally useless - reading the code I don't care what changes claude made in the past, I care if there's something useful to know now. Probably can just drop this.
There was a problem hiding this comment.
Removed, and noted for future.
| /// | ||
| /// Returns a list of [`JusticeTransaction`]s, each containing a fully signed | ||
| /// transaction and metadata about the revoked commitment it punishes. | ||
| pub fn get_justice_txs( |
There was a problem hiding this comment.
This API doesn't really make sense. There's no information on when I should call this or when the monitor "knows about" a revoked tx. We probably want to do soemthing like the old API where you can fetch revoked transactions in relation to a specific monitor update.
There was a problem hiding this comment.
Replaced with sign_initial_justice_tx() for persist_new_channel and sign_justice_txs_from_update(update) for update_persisted_channel. Each is tied to a specific point in the persistence pipeline.
9af37e1 to
034c377
Compare
TheBlueMatt
left a comment
There was a problem hiding this comment.
Thanks, I think this basically looks good!
| let mut result = Vec::new(); | ||
| for commitment_tx in &to_sign { | ||
| if let Some(jtx) = | ||
| self.try_sign_justice_tx(commitment_tx, feerate_per_kw, destination_script.clone()) |
There was a problem hiding this comment.
Should we also build justice transactions for all the potential HTLC transactions? I assume we can...
There was a problem hiding this comment.
Added. try_sign_justice_txs now iterates nondust_htlcs() on the revoked commitment, builds a spending tx for each HTLC output, and signs with sign_justice_revoked_htlc. Witness follows the same pattern as RevokedHTLCOutput in package.rs. Dust HTLCs are skipped when fee exceeds value.
Also added test_justice_tx_htlc_from_monitor_updates which routes a payment, captures the commitment with a pending HTLC, revokes it, and verifies the watchtower gets justice txs for both to_local and the HTLC output.
| funding.prev_counterparty_commitment_tx = | ||
| funding.cur_counterparty_commitment_tx.take(); | ||
| funding.cur_counterparty_commitment_tx = Some(commitment_tx); | ||
| } |
There was a problem hiding this comment.
I assume we can/should do this?
| } | |
| } else { | |
| debug_assert!(false); | |
| } |
Extends try_sign_justice_txs to also sweep HTLC outputs when a counterparty broadcasts a revoked commitment. For each non-dust HTLC, builds a spending transaction and signs it with sign_justice_revoked_htlc. Updates WatchtowerPersister to store multiple justice txs per revoked commitment (Vec<Transaction> instead of single Transaction). Addresses review feedback from TheBlueMatt on PR lightningdevkit#4453.
FreeOnlineUser
left a comment
There was a problem hiding this comment.
Updated with both changes. API is now sign_initial_justice_txs / try_sign_justice_txs (plural) since they return a Vec covering to_local + HTLCs. WatchtowerPersister stores Vec<Transaction> per revoked txid accordingly.
Extends try_sign_justice_txs to also sweep HTLC outputs when a counterparty broadcasts a revoked commitment. For each non-dust HTLC, builds a spending transaction and signs it with sign_justice_revoked_htlc. Updates WatchtowerPersister to store multiple justice txs per revoked commitment (Vec<Transaction> instead of single Transaction). Addresses review feedback from TheBlueMatt on PR lightningdevkit#4453.
7455283 to
e3976a0
Compare
| let mut to_sign = Vec::new(); | ||
| for funding in core::iter::once(&mut self.funding).chain(self.pending_funding.iter_mut()) { | ||
| let should_take = | ||
| funding.prev_counterparty_commitment_tx.as_ref().is_some_and(|prev| { | ||
| self.commitment_secrets.get_secret(prev.commitment_number()).is_some() | ||
| }); | ||
| if should_take { | ||
| to_sign.push(funding.prev_counterparty_commitment_tx.take().unwrap()); | ||
| } |
There was a problem hiding this comment.
Bug: sign_justice_txs_from_update mutates the monitor's in-memory state via .take() on prev_counterparty_commitment_tx. Since this is called inside a Persist callback, if the persist operation fails and is retried, the prev commitment has already been removed from memory — the retry will produce no justice txs.
The mutation is committed immediately (interior mutability via Mutex), but the serialized/persisted state may not reflect it if the persist fails. On a process restart, the old persisted state would have the prev, but during the same process lifetime, the data is gone.
Consider either:
- Making this idempotent (don't take, just check and sign — dedup on the caller side)
- Or cloning instead of taking, and only clearing after confirmed persist
| // Store new commitments, rotating cur -> prev per funding scope. | ||
| for commitment_tx in new_commitment_txs { | ||
| let txid = commitment_tx.trust().built_transaction().txid; | ||
| let funding = core::iter::once(&mut self.funding) | ||
| .chain(self.pending_funding.iter_mut()) | ||
| .find(|f| f.current_counterparty_commitment_txid == Some(txid)); | ||
| if let Some(funding) = funding { | ||
| funding.prev_counterparty_commitment_tx = | ||
| funding.cur_counterparty_commitment_tx.take(); | ||
| funding.cur_counterparty_commitment_tx = Some(commitment_tx); | ||
| } else { | ||
| debug_assert!(false); | ||
| } | ||
| } |
There was a problem hiding this comment.
The rotation prev = cur.take(); cur = new; happens before checking whether the old prev had a revocation secret available. If the caller skips calling sign_justice_txs_from_update for an update that only contains a revocation (no new commitment), and then calls it for the next update that does contain a new commitment, the old prev is overwritten without ever being signed.
This is safe under the assumption that the caller invokes this for every ChannelMonitorUpdate, but the doc comment doesn't state this requirement. The method should either:
- Document that it MUST be called for every update (not just commitment-bearing ones), or
- Check the old prev for revocation secrets before the rotation, sign it if ready, and then rotate.
| Some(tx) => tx, | ||
| None => return Vec::new(), | ||
| }; | ||
| self.funding.cur_counterparty_commitment_tx = Some(commitment_tx.clone()); |
There was a problem hiding this comment.
This unconditionally overwrites cur_counterparty_commitment_tx with the initial commitment. Since provide_initial_counterparty_commitment_tx (line 3539) already sets this field, this is redundant in the normal path.
More importantly, since sign_initial_justice_txs is a pub method, a caller who mistakenly invokes it after updates have been applied would reset cur_counterparty_commitment_tx back to the initial commitment, silently corrupting the rotation state for sign_justice_txs_from_update. Consider guarding this (e.g., only set if cur_counterparty_commitment_tx.is_none()).
| funding.cur_counterparty_commitment_tx.take(); | ||
| funding.cur_counterparty_commitment_tx = Some(commitment_tx); | ||
| } else { | ||
| debug_assert!(false); |
There was a problem hiding this comment.
debug_assert!(false) only fires in debug builds. In release, the new commitment is silently dropped, meaning the watchtower will miss this commitment entirely (and any subsequent ones for this funding scope). At minimum, this should log a warning. Consider log_error! or returning an error.
| let fee = Amount::from_sat(crate::chain::chaininterface::fee_for_weight(feerate_per_kw as u32, | ||
| // Base tx weight + witness weight | ||
| Transaction { version: Version::TWO, lock_time: LockTime::ZERO, input: input.clone(), output: vec![TxOut { script_pubkey: destination_script.clone(), value: htlc_value }] }.weight().to_wu() + weight_estimate | ||
| )); |
There was a problem hiding this comment.
Nit: This constructs a full Transaction object (cloning input and destination_script) purely for weight estimation. The existing build_to_local_justice_tx does this too, but here it happens in a loop for every HTLC. Consider computing the base weight once outside the loop since it's the same for all HTLC justice txs (single input, single output to the same destination_script).
| (13, cur_counterparty_commitment_tx, option), | ||
| (15, prev_counterparty_commitment_tx, option), |
There was a problem hiding this comment.
These fields are added to FundingScope's impl_writeable_tlv_based! (at TLV 13/15, which applies to pending_funding entries), and to the top-level write_chanmon_internal (at TLV 39/41, for the main funding scope). During deserialization, the main funding scope reads from TLV 39/41 while pending_funding entries read from their per-FundingScope TLV 13/15.
This split serialization works but is subtle and fragile — a future change that serializes the main funding as a whole FundingScope would silently double-write the data. Worth adding a comment explaining why these live in two places.
lightning/src/util/test_utils.rs
Outdated
| @@ -722,24 +705,19 @@ impl WatchtowerPersister { | |||
| .get(&channel_id) | |||
There was a problem hiding this comment.
The old justice_tx method returned the single justice tx directly. Now it returns only the first element from the vec via .first(). This works because try_sign_justice_txs pushes the to_local justice tx first, but this ordering is an implicit invariant. If the order in try_sign_justice_txs ever changes, existing callers of justice_tx would silently get the wrong transaction. Consider adding a comment documenting this assumption, or filtering by output type.
|
I've thoroughly re-examined every file and hunk in the diff against my prior review findings and the codebase. The prior review was comprehensive; the HTLC witness construction, signer access patterns, feerate handling, and TLV serialization all correctly follow existing codebase patterns. No new issues beyond those already posted. Review SummaryAll prior inline comments remain valid and unresolved. No new issues found in this pass. Prior inline comments (still valid, not re-posted)Security-critical bugs:
Bugs:
Code quality / correctness:
Cross-cutting concerns
|
| /// Stores new counterparty commitment(s) from the update and signs any | ||
| /// previously-stored commitments whose revocation secrets are now available. | ||
| /// | ||
| /// Intended to be called during [`Persist::update_persisted_channel`]. |
There was a problem hiding this comment.
This needs a lot more detail. One big concern is that we need to describe how crash-safety works. Let's say I start storing a ChannelMonitorUpdate while I'm waiting on my watchtower to persist my new transactions, but then crash before the second. I have to make sure the transactions get reloaded on restart somehow and aren't lost, even if the ChannelMonitorUpdate made it to disk (and thus wasn't replayed). Not sure if this should be handled by forcing downstream logic to persist before the monitor or if we should have some way to do a replay.
There was a problem hiding this comment.
You're right, the update-already-persisted case means replay alone isn't enough.
I think the cleanest fix is: keep the prev/cur commitment data in the monitor (clone instead of .take()), serialize it, and add a get_pending_justice_txs() method that reads the stored state without mutating it. On restart, the Persist impl calls this against the loaded monitor to recover any justice txs it didn't finish sending.
The full crash-safety model would be:
- Normal path: Persist impl gets justice txs from the update, sends to watchtower, returns
Completed - Crash recovery: Persist impl calls
get_pending_justice_txs()on the loaded monitor at startup, re-sends anything the watchtower doesn't have - The watchtower deduplicates on its end (or the Persist impl tracks what's been acked)
This makes the API idempotent and crash-safe without requiring any ordering constraints on when the monitor update is persisted. I'll rework the implementation and expand the docs.
There was a problem hiding this comment.
I think that seems reasonable. My question then is do we even need sign_justice_txs_from_update? Should we instead drop the from-update API and replace it with a simple bool on a ChannelMonitorUpdate to check if there is a commitment update that may need to be written to a watchtower?
There was a problem hiding this comment.
Agreed. Dropped both sign_justice_txs_from_update and sign_initial_justice_txs. The API is now just get_pending_justice_txs() plus ChannelMonitorUpdate::updates_watchtower_state() to signal when to call it. Rotation happens in the normal update path.
|
🔔 1st Reminder Hey @valentinewallace! This PR has been waiting for your review. |
|
🔔 2nd Reminder Hey @valentinewallace! This PR has been waiting for your review. |
|
Feel free to re-request when feedback is addressed! |
e3976a0 to
0c79167
Compare
| /// The previous commitment data is cloned rather than consumed, so calling | ||
| /// this multiple times for the same update produces the same justice | ||
| /// transactions. The data also survives serialization, enabling recovery |
There was a problem hiding this comment.
Bug: This doc claim is false — the method is not idempotent within a single process lifetime.
While the signing phase clones prev (line 4914), the rotation phase at lines 4888-4889 still uses cur_counterparty_commitment_tx.take(), which consumes cur. On a second call with the same update:
counterparty_commitment_txs_from_updatere-extracts the same new commitment tx- The funding scope match succeeds (same txid)
prev = cur.take()— butcurnow holds the new commitment (set by the first call), soprevgets overwritten with the new commitment, discarding the old revoked commitment- Secret matching against
prevfails (new commitment number ≠ revocation index) - No justice transactions are produced
This matters if a Persist implementation retries on transient failure (e.g., network error to a remote store) without restarting the process. The watchtower would silently miss revoked commitments.
Suggest either fixing the rotation to not use .take() (e.g., clone cur into prev), or correcting this doc to state that the method must only be called once per update per process lifetime.
| for commitment_tx in new_commitment_txs { | ||
| let txid = commitment_tx.trust().built_transaction().txid; | ||
| let funding = core::iter::once(&mut self.funding) | ||
| .chain(self.pending_funding.iter_mut()) | ||
| .find(|f| f.current_counterparty_commitment_txid == Some(txid)); | ||
| if let Some(funding) = funding { | ||
| funding.prev_counterparty_commitment_tx = | ||
| funding.cur_counterparty_commitment_tx.take(); | ||
| funding.cur_counterparty_commitment_tx = Some(commitment_tx); | ||
| } else { | ||
| debug_assert!(false); | ||
| } | ||
| } |
There was a problem hiding this comment.
Bug: The rotation and signing are ordered incorrectly for the case where a single ChannelMonitorUpdate contains both a new counterparty commitment step and a CommitmentSecret step that revokes the commitment that was in prev before the rotation.
Example: before this method runs, cur = C(n), prev = C(n+1). An update arrives containing LatestCounterpartyCommitmentTXInfo(C(n-1)) and CommitmentSecret(idx = n+1).
- Rotation:
prev = C(n)(was cur),cur = C(n-1)— C(n+1) is lost new_secret_indices = [n+1]- Check
prev.commitment_number()=n— doesn't matchn+1 - Justice tx for
C(n+1)is never produced
The fix is to check/sign the old prev against available secrets before performing the rotation. Whether this scenario actually occurs in current LDK channel code depends on whether commitment_signed and revoke_and_ack processing ever produce a single batched ChannelMonitorUpdate — but the method's contract doesn't prohibit it, and future code changes could introduce this pattern.
lightning/src/ln/functional_tests.rs
Outdated
| // The pending justice tx corresponds to the most recently revoked commitment, | ||
| // which may differ from the one we captured earlier if multiple rotations occurred. | ||
| // Verify it spends a valid revoked commitment output. | ||
| assert!(pending[0].commitment_number < INITIAL_COMMITMENT_NUMBER); |
There was a problem hiding this comment.
Nit: This assertion is extremely weak — commitment_number < INITIAL_COMMITMENT_NUMBER is true for any non-initial commitment and would pass even if get_pending_justice_txs returned a completely unrelated commitment.
Since only one rotation occurs in this test, consider asserting that pending[0].revoked_commitment_txid == revoked_txid, which would actually validate the crash recovery round-trip rather than trivially passing.
0e293ab to
e0ebc67
Compare
| /// revoked commitment per funding scope. If multiple commitment rotations | ||
| /// occurred between the crash and restart (e.g., the counterparty sent | ||
| /// several updates while the process was down), earlier revoked commitments | ||
| /// will have been overwritten. In practice, this requires the counterparty |
There was a problem hiding this comment.
Ha, I don't think we need to document things that are impossible :)
| /// Stores new counterparty commitment(s) from the update and signs any | ||
| /// previously-stored commitments whose revocation secrets are now available. | ||
| /// | ||
| /// Intended to be called during [`Persist::update_persisted_channel`]. |
There was a problem hiding this comment.
I think that seems reasonable. My question then is do we even need sign_justice_txs_from_update? Should we instead drop the from-update API and replace it with a simple bool on a ChannelMonitorUpdate to check if there is a commitment update that may need to be written to a watchtower?
| self.inner.lock().unwrap().sign_to_local_justice_tx(justice_tx, input_idx, value, commitment_number) | ||
| } | ||
|
|
||
| /// Stores the initial counterparty commitment and returns signed justice transactions |
There was a problem hiding this comment.
Hmm? This doesn't "store" anything?
There was a problem hiding this comment.
Removed the method. Storage now happens during update_counterparty_commitment_data and the LatestCounterpartyCommitmentTXInfo processing path.
| /// 2. Send them to the watchtower | ||
| /// 3. Persist the monitor update to disk | ||
| /// | ||
| /// If the process crashes before step 3, the monitor update was never |
There was a problem hiding this comment.
In practice we could restart and the counterparty could give us a different commitment transaction. They shouldn't, but they could. I imagine this implies we should instead only send the justice txn to the watchtower after persistence?
There was a problem hiding this comment.
Makes sense. The WatchtowerPersister example now calls get_pending_justice_txs after persistence completes, not before.
| /// the same transactions. | ||
| /// | ||
| /// If the process crashes after step 3, the update won't replay, but the | ||
| /// previous commitment data is still present in the deserialized monitor. |
There was a problem hiding this comment.
Okay but what if we lose connectivity to the watchtower but not our peer? We'd then step the channel forward twice and on restart lose the previous state. ISTM we have to tell clients to hold on to monitor updates if there's still a pending watchtower write for a previous update.
There was a problem hiding this comment.
Agreed. Added doc guidance that the Persist implementation should delay completing monitor updates until previously obtained justice transactions have been delivered.
a9a7221 to
a07c56d
Compare
| for jtx in justice_txs { | ||
| self.watchtower_state | ||
| .lock() | ||
| .unwrap() | ||
| .get_mut(&data.channel_id()) | ||
| .unwrap() | ||
| .entry(jtx.revoked_commitment_txid) | ||
| .or_insert_with(Vec::new) | ||
| .push(jtx.tx); |
There was a problem hiding this comment.
Bug: get_pending_justice_txs is idempotent by design — it returns the same results on repeated calls for the same monitor state. But this code uses Vec::push, which accumulates duplicates across consecutive watchtower-relevant updates that don't rotate prev.
Example: a CommitmentSecret update followed by a RenegotiatedFunding update — both trigger updates_watchtower_state(), and prev doesn't change between them, so the same justice tx(s) are pushed into the Vec twice.
The old code used HashMap::insert(commitment_txid, signed_justice_tx) which was an idempotent upsert. Consider either:
- Using
entry().or_insert()instead ofpush(replacing rather than appending when the txid already has entries) - Or clearing the Vec for this txid before pushing the new results
| for jtx in justice_txs { | |
| self.watchtower_state | |
| .lock() | |
| .unwrap() | |
| .get_mut(&data.channel_id()) | |
| .unwrap() | |
| .entry(jtx.revoked_commitment_txid) | |
| .or_insert_with(Vec::new) | |
| .push(jtx.tx); | |
| for jtx in justice_txs { | |
| self.watchtower_state | |
| .lock() | |
| .unwrap() | |
| .get_mut(&data.channel_id()) | |
| .unwrap() | |
| .insert(jtx.revoked_commitment_txid, vec![jtx.tx]); | |
| } |
Note: this replaces rather than appends, which is correct since get_pending_justice_txs always returns the complete set for each revoked commitment.
| // Populate cur/prev for the LatestCounterpartyCommitmentTXInfo path, which | ||
| // doesn't go through update_counterparty_commitment_data. |
There was a problem hiding this comment.
The comment says "for the LatestCounterpartyCommitmentTXInfo path" but this post-loop is also the only mechanism that sets cur_counterparty_commitment_tx for RenegotiatedFunding scopes (created at line 4097 with cur_counterparty_commitment_tx: None).
If this code is removed during the planned LatestCounterpartyCommitmentTXInfo deprecation, RenegotiatedFunding (splice) scopes would silently lose their initial commitment tracking — cur would stay None, so after the next commitment update, prev would be None instead of the initial splice commitment, and the watchtower would miss that revocation.
Suggest updating the comment:
| // Populate cur/prev for the LatestCounterpartyCommitmentTXInfo path, which | |
| // doesn't go through update_counterparty_commitment_data. | |
| // Populate cur/prev for paths that don't go through | |
| // update_counterparty_commitment_data: LatestCounterpartyCommitmentTXInfo | |
| // (legacy, to be removed) and RenegotiatedFunding (splice). |
| let fee = Amount::from_sat(crate::chain::chaininterface::fee_for_weight( | ||
| feerate_per_kw as u32, | ||
| // Base tx weight + witness weight | ||
| Transaction { | ||
| version: Version::TWO, | ||
| lock_time: LockTime::ZERO, | ||
| input: input.clone(), | ||
| output: vec![TxOut { | ||
| script_pubkey: destination_script.clone(), | ||
| value: htlc_value, | ||
| }], | ||
| } | ||
| .weight() | ||
| .to_wu() + weight_estimate, | ||
| )); |
There was a problem hiding this comment.
The fee estimation constructs a throwaway Transaction with input.clone() and destination_script.clone() per HTLC, but the base tx weight (excluding witness) is identical for every HTLC justice tx in this loop — same version, locktime, single input (empty witness), single output to the same destination script.
Consider hoisting the base weight calculation before the loop and only varying the witness weight per HTLC type:
let base_weight = Transaction {
version: Version::TWO,
lock_time: LockTime::ZERO,
input: vec![TxIn { ..Default::default() }],
output: vec![TxOut { script_pubkey: destination_script.clone(), value: Amount::ZERO }],
}.weight().to_wu();
// Then inside the loop:
let fee = fee_for_weight(feerate_per_kw as u32, base_weight + weight_estimate);This avoids cloning input and destination_script on every iteration. (Reiterating prior comment #5 with a concrete suggestion since it's in a per-HTLC loop that could be large.)
| pub fn test_justice_tx_crash_recovery() { | ||
| // Verify that get_pending_justice_txs returns justice txs for the most | ||
| // recently revoked counterparty commitment, enabling crash recovery. | ||
| let chanmon_cfgs = create_chanmon_cfgs(2); | ||
| let destination_script = chanmon_cfgs[1].keys_manager.get_destination_script([0; 32]).unwrap(); | ||
| let persisters = [ | ||
| WatchtowerPersister::new( | ||
| chanmon_cfgs[0].keys_manager.get_destination_script([0; 32]).unwrap(), | ||
| ), | ||
| WatchtowerPersister::new(destination_script.clone()), | ||
| ]; | ||
| let node_cfgs = create_node_cfgs_with_persisters(2, &chanmon_cfgs, persisters.iter().collect()); | ||
| let legacy_cfg = test_legacy_channel_config(); | ||
| let node_chanmgrs = | ||
| create_node_chanmgrs(2, &node_cfgs, &[Some(legacy_cfg.clone()), Some(legacy_cfg)]); | ||
| let nodes = create_network(2, &node_cfgs, &node_chanmgrs); | ||
|
|
||
| let (_, _, channel_id, _) = create_announced_chan_between_nodes(&nodes, 0, 1); | ||
|
|
||
| // Move to a non-initial commitment | ||
| send_payment(&nodes[0], &[&nodes[1]], 5_000_000); | ||
|
|
||
| // Capture the commitment that will be revoked | ||
| let revoked_local_txn = get_local_commitment_txn!(nodes[0], channel_id); | ||
| assert_eq!(revoked_local_txn.len(), 1); | ||
| let revoked_txid = revoked_local_txn[0].compute_txid(); | ||
|
|
||
| // Revoke it | ||
| send_payment(&nodes[0], &[&nodes[1]], 5_000_000); | ||
|
|
||
| // The persister should have a justice tx for this revoked commitment | ||
| assert!(persisters[1].justice_tx(channel_id, &revoked_txid).is_some()); | ||
|
|
||
| // get_pending_justice_txs should also return it, since prev still holds | ||
| // the revoked commitment data (cloned, not consumed by signing). | ||
| let pending = { | ||
| let monitor = get_monitor!(nodes[1], channel_id); | ||
| monitor.get_pending_justice_txs(FEERATE_FLOOR_SATS_PER_KW as u64, destination_script) | ||
| }; | ||
| assert!(!pending.is_empty()); | ||
| assert!(pending[0].commitment_number < INITIAL_COMMITMENT_NUMBER); | ||
| } |
There was a problem hiding this comment.
This test is named "crash_recovery" but doesn't actually test the crash recovery path — it never serializes, discards in-memory state, and deserializes the monitor. It just checks that get_pending_justice_txs returns results during normal (non-crash) operation, which is already covered by test_justice_tx_idempotent.
A meaningful crash recovery test would:
- Serialize the monitor after the revocation
- Deserialize it into a fresh
ChannelMonitor - Call
get_pending_justice_txson the deserialized monitor - Verify the justice tx is present and valid
This would exercise the TLV 43/45 serialization round-trip and confirm that prev_counterparty_commitment_tx survives persistence — the actual invariant that crash recovery depends on.
a07c56d to
ee59374
Compare
| if !self | ||
| .funding | ||
| .prev_counterparty_commitment_tx | ||
| .as_ref() | ||
| .is_some_and(|p| self.commitment_secrets.get_secret(p.commitment_number()).is_some()) | ||
| { | ||
| self.funding.prev_counterparty_commitment_tx = | ||
| self.funding.cur_counterparty_commitment_tx.take(); | ||
| } |
There was a problem hiding this comment.
Bug (security-critical): Guard condition causes permanent loss of revoked commitments after the first channel state update.
This guard checks if prev already holds a revoked commitment (secret available) and, if so, refuses to rotate. The problem is this check is permanently sticky — once a secret is provided, get_secret() returns Some forever. After the first CommitmentSecret update revokes the commitment in prev, every subsequent LatestCounterpartyCommitment update will hit this guard, and the outgoing cur (which becomes revoked in the next CommitmentSecret update) is silently dropped.
Trace with 4 sequential updates (normal commitment_signed/revoke_and_ack flow):
| Update | Step | prev after | cur after | Justice produced |
|---|---|---|---|---|
| A (commitment_signed) | LatestCounterpartyCommitment(N-1) | C(N) | C(N-1) | — |
| B (revoke_and_ack) | CommitmentSecret(N) | C(N) [revoked] | C(N-1) | C(N) ✓ |
| C (commitment_signed) | LatestCounterpartyCommitment(N-2) | C(N) [guard blocks] | C(N-2) | C(N) (dup) |
| D (revoke_and_ack) | CommitmentSecret(N-1) | C(N) [still locked] | C(N-2) | C(N) (dup) |
At Update C, C(N-1) was in cur and is silently overwritten. At Update D, its secret arrives but C(N-1) is in neither prev nor cur. The watchtower never gets the justice tx for C(N-1).
This repeats for every subsequent commitment — only C(N) (the very first revoked commitment) is ever returned. All later revoked commitments are lost.
Fix: Remove the guard entirely. Without it, the rotation prev = cur.take(); cur = new; works correctly: each commitment spends exactly one update cycle in prev, and by the time its secret arrives (in the next CommitmentSecret update), it's still in prev.
The same bug exists at lines 3654-3658 for pending_funding scopes.
| funding.prev_counterparty_commitment_tx = funding.cur_counterparty_commitment_tx.take(); | ||
| funding.cur_counterparty_commitment_tx = Some(commitment_tx); |
There was a problem hiding this comment.
Nit: Inconsistent indentation — line 4395 has an extra leading tab compared to line 4396. Both should be at the same indentation level within the if block.
| funding.prev_counterparty_commitment_tx = funding.cur_counterparty_commitment_tx.take(); | |
| funding.cur_counterparty_commitment_tx = Some(commitment_tx); | |
| funding.prev_counterparty_commitment_tx = funding.cur_counterparty_commitment_tx.take(); | |
| funding.cur_counterparty_commitment_tx = Some(commitment_tx); |
b689315 to
cab9a18
Compare
| assert!(!pending.is_empty()); | ||
| assert!(!pending.is_empty()); |
There was a problem hiding this comment.
Bug: Lines 1139-1140 are identical — assert!(!pending.is_empty()) is duplicated. This looks like a copy-paste error. The second assertion should check something meaningful, e.g.:
| assert!(!pending.is_empty()); | |
| assert!(!pending.is_empty()); | |
| assert!(!pending.is_empty()); | |
| assert_eq!(pending[0].revoked_commitment_txid, revoked_txid); |
This would actually validate that get_pending_justice_txs returns the justice tx for the correct revoked commitment, rather than just checking non-emptiness twice.
lightning/src/ln/functional_tests.rs
Outdated
| commitment_tx_base_weight, second_stage_tx_fees_sat, COMMITMENT_TX_WEIGHT_PER_HTLC, | ||
| OFFERED_HTLC_SCRIPT_WEIGHT, | ||
| }; | ||
| use crate::ln::channel::INITIAL_COMMITMENT_NUMBER; |
There was a problem hiding this comment.
Unused import: INITIAL_COMMITMENT_NUMBER is added here but never used in any of the new test functions. The only uses in this file (line ~6797) define a local const that shadows this import. This will produce a compiler warning.
| use crate::ln::channel::INITIAL_COMMITMENT_NUMBER; | |
| use crate::ln::channel::{ |
| // Safe to overwrite prev: CommitmentSecret and new counterparty commitments | ||
| // are always in separate ChannelMonitorUpdates, so the Persist callback has | ||
| // a chance to call get_pending_justice_txs between revocation and rotation. | ||
| self.funding.prev_counterparty_commitment_tx = | ||
| self.funding.cur_counterparty_commitment_tx.take(); | ||
| self.funding.cur_counterparty_commitment_tx = Some(current_funding_commitment_tx.clone()); |
There was a problem hiding this comment.
Bug (security): The comment's claimed invariant is false. revoke_and_ack processing in channel.rs (lines 9211-9216, 9239-9244) batches CommitmentSecret + LatestCounterpartyCommitment/LatestCounterpartyCommitmentTXInfo into a single ChannelMonitorUpdate whenever the holding cell is freed or a new commitment is required.
This causes missed justice transactions. Trace:
Setup: prev = C(K), cur = C(K-1). Neither has been revoked yet.
Batched update: [CommitmentSecret(K), LatestCounterpartyCommitment(C(K-2))]
CommitmentSecret(K)processed → secret for K storedLatestCounterpartyCommitment(C(K-2))processed → unconditional rotation:prev = C(K-1),cur = C(K-2). C(K) is lost from both slots.- Persist callback →
get_pending_justice_txs(): prev = C(K-1) (no secret for K-1 yet), cur = C(K-2) (no secret) → no justice tx produced for C(K) - C(K)'s secret exists but C(K) is in neither
prevnorcur. Its justice tx is permanently lost.
This happens whenever the remote's revoke_and_ack triggers maybe_free_holding_cell_htlcs() or require_commitment — a common scenario during high-throughput payment forwarding.
The same issue affects the post-loop at lines 4376-4391 (the LatestCounterpartyCommitmentTXInfo path), which also unconditionally rotates prev.
Fix suggestion: Before overwriting prev, check if it holds a commitment whose secret is now available. If so, produce justice txs for it before rotating — or buffer the outgoing prev in a temporary list that get_pending_justice_txs also checks. Alternatively, the rotation logic could be moved to the CommitmentSecret step (where the revocation actually happens) rather than the commitment step (where the new commitment arrives).
ef8805e to
b8fc4f9
Compare
Adds sign_initial_justice_txs(), sign_justice_txs_from_update(), and get_pending_justice_txs() to ChannelMonitor, enabling Persist implementors to obtain signed justice transactions for both to_local and HTLC outputs without maintaining external state. Storage uses cur/prev counterparty commitment fields on FundingScope, matching the existing pattern and supporting splicing. The API is crash-safe: commitment data is cloned rather than consumed, and get_pending_justice_txs() allows recovery after restart. Simplifies WatchtowerPersister in test_utils by removing manual queue and signing logic. Addresses feedback from lightningdevkit/ldk-node#813 and picks up the intent of lightningdevkit#2552.
b8fc4f9 to
92e35d2
Compare
Draft PR for design feedback. Implements the approach @TheBlueMatt suggested in lightningdevkit/ldk-node#813 and in review of #2552: move justice tx state tracking inside
ChannelMonitorsoPersistimplementors don't need external queues.What this does
Adds a single method that replaces the current 3-step dance of
counterparty_commitment_txs_from_update()+ manual queue +sign_to_local_justice_tx():Internally,
ChannelMonitorImplstores recent counterpartyCommitmentTransactions (populated during update application, pruned to entries within one revocation of current).get_justice_txschecks which have revocation secrets available, builds and signs the justice tx, and returns the result.Changes
channelmonitor.rs: newJusticeTransactionstruct,latest_counterparty_commitment_txsfield, TLV 39 serialization (optional, backwards-compatible),get_justice_txs()methodtest_utils.rs: simplifiedWatchtowerPersister(removedJusticeTxData,unsigned_justice_tx_dataqueue,form_justice_data_from_commitment)Splice handling
Pruning uses commitment numbers, not entry count. During a splice, multiple entries share the same commitment number (one per funding scope) and all are retained.
Backwards compatibility
counterparty_commitment_txs_from_update()andsign_to_local_justice_tx()APIs unchanged.Design questions
Looking for input on these before moving out of draft:
Signed vs unsigned return? Currently returns fully signed transactions. Returning unsigned gives callers more flexibility on fee choice at broadcast time. Preference?
HTLC outputs? This only covers
to_localjustice. HTLC-timeout and HTLC-success outputs on revoked commitments are not included. Worth adding here, or separate follow-up?Feerate source? Caller provides
feerate_per_kw. An alternative would be accepting a fee estimator. Any preference?Dust filtering? If the revokeable output is dust,
revokeable_output_index()returnsNoneand the entry is skipped silently. Right behavior, or surface this to the caller?